
fix: treat -0.0 and +0.0 as equal in comparisons and IN list #22628

Open
sweb wants to merge 2 commits into apache:main from sweb:fix-negative-zero-equals-zero

sweb (contributor) commented May 29, 2026

Which issue does this PR close?

Rationale for this change

Per IEEE 754 default semantics, -0.0 == +0.0 (and -0.0 < +0.0 is false). PostgreSQL, DuckDB, and Python all follow this. DataFusion currently treats -0.0 as strictly less than +0.0 (so -0.0 >= 0.0 evaluates to false), because arrow-rs's comparison kernels intentionally use IEEE 754 totalOrder semantics. This produces surprising results in WHERE filters, IN lists, and IS [NOT] DISTINCT FROM, especially when -0.0 is produced by arithmetic on a column (e.g. x * -1 where x = 0.0).

See also https://github.com/apache/arrow-rs/blob/58.3.0/arrow-ord/src/cmp.rs#L66-L80
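The difference between the two orderings can be reproduced in plain Rust, independent of arrow-rs. This is a minimal sketch that assumes `f64::total_cmp` as a stand-in for the totalOrder kernels; the helper names `ieee_ge` and `total_order_ge` are hypothetical:

```rust
use std::cmp::Ordering;

/// IEEE 754 default comparison: -0.0 and +0.0 compare equal.
fn ieee_ge(a: f64, b: f64) -> bool {
    a >= b
}

/// totalOrder comparison (via `f64::total_cmp`): -0.0 sorts before +0.0.
fn total_order_ge(a: f64, b: f64) -> bool {
    a.total_cmp(&b) != Ordering::Less
}
```

With `x = 0.0_f64`, `x * -1.0` yields -0.0; `ieee_ge(x * -1.0, 0.0)` is true, while `total_order_ge(x * -1.0, 0.0)` is false, which matches the `-0.0 >= 0.0` behavior described above.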

The issue was debugged, reproduced, and further explored using Claude Code; the resulting changes were then adjusted and improved.

What changes are included in this PR?

  • Add normalize_neg_zero / normalize_neg_zero_array / normalize_neg_zero_scalar in datafusion-physical-expr-common::datum. These rewrite -0.0 to +0.0 for float inputs and pass arrays through unchanged (no allocation) when no -0.0 is present.
  • Apply the normalization in apply_cmp so all comparison operators (=, <>, <, <=, >, >=, distinct / not-distinct, like / ilike) inherit IEEE 754 zero semantics.
  • Apply it in InListExpr for both the dynamic comparator path and the per-list-expression normalization. For the primitive static filter (which hashes via OrderedFloat), inserting 0.0 now also inserts -0.0 (and vice versa) so set membership matches the normalized comparison semantics.
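The normalization and the IN-list change can be sketched on plain `f64` slices. This is a simplified illustration, not the PR's code: `normalize_neg_zero_slice`, `cmp_normalized`, and `FloatInList` are hypothetical names, `Cow` stands in for the no-allocation passthrough on Arrow arrays, and the set is keyed by raw bit pattern (an assumption chosen to model a hash key that distinguishes -0.0 from +0.0, not DataFusion's actual hashing):

```rust
use std::borrow::Cow;
use std::cmp::Ordering;
use std::collections::HashSet;

/// True only for -0.0 (NaN and +0.0 are excluded).
fn is_neg_zero(x: f64) -> bool {
    x == 0.0 && x.is_sign_negative()
}

/// Rewrites -0.0 to +0.0; every other value, including NaN, is unchanged.
fn normalize_neg_zero(x: f64) -> f64 {
    if x == 0.0 { 0.0 } else { x }
}

/// Borrows the input unchanged (no allocation) when it contains no -0.0.
fn normalize_neg_zero_slice(values: &[f64]) -> Cow<'_, [f64]> {
    if values.iter().copied().any(is_neg_zero) {
        Cow::Owned(values.iter().copied().map(normalize_neg_zero).collect())
    } else {
        Cow::Borrowed(values)
    }
}

/// totalOrder comparison applied after normalization: zeros compare equal,
/// all other values keep their totalOrder position.
fn cmp_normalized(a: f64, b: f64) -> Ordering {
    normalize_neg_zero(a).total_cmp(&normalize_neg_zero(b))
}

/// IN-list set keyed by bit pattern. Inserting either zero inserts both,
/// so membership no longer depends on the sign of zero.
struct FloatInList {
    bits: HashSet<u64>,
}

impl FloatInList {
    fn new(list: &[f64]) -> Self {
        let mut bits = HashSet::new();
        for &v in list {
            bits.insert(v.to_bits());
            if v == 0.0 {
                // Matches both +0.0 and -0.0.
                bits.insert(0.0_f64.to_bits());
                bits.insert((-0.0_f64).to_bits());
            }
        }
        Self { bits }
    }

    fn contains(&self, probe: f64) -> bool {
        self.bits.contains(&probe.to_bits())
    }
}
```

Because only zeros are rewritten in this sketch, NaN keeps its totalOrder position (`cmp_normalized(f64::NAN, f64::INFINITY)` is `Greater`); only the sign of zero stops mattering.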

Are these changes tested?

Yes:

  • Unit tests in datum.rs cover float normalization, passthrough (no allocation) when no -0.0 is present, and dictionary arrays.
  • A new sqllogictest covers this particular case. IEEE 754 zero handling may be problematic in other places as well, which this PR does not touch.

Are there any user-facing changes?

Yes: comparisons involving -0.0 now follow IEEE 754 semantics, e.g. -0.0 = 0.0 and -0.0 >= 0.0 evaluate to true.

Because this adds a normalization step to every comparison, it also has performance implications. Checking for a float type first reduces the overhead, but my benchmark results were not very consistent. Please check whether IEEE 754 behavior for -0.0 is desirable for DataFusion and whether this implementation approach fits.
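Checking the type first means non-float inputs skip the -0.0 scan entirely and only pay for a type match. A hypothetical sketch, with a simplified `Column` enum standing in for Arrow arrays (not DataFusion's actual types or dispatch):

```rust
use std::borrow::Cow;

/// Simplified stand-in for a typed Arrow array.
#[derive(Clone)]
enum Column {
    Int64(Vec<i64>),
    Float64(Vec<f64>),
    Utf8(Vec<String>),
}

/// Only float columns are scanned for -0.0; every other type, and any float
/// column without -0.0, is returned borrowed without allocating.
fn normalize_column(col: &Column) -> Cow<'_, Column> {
    match col {
        Column::Float64(v) if v.iter().any(|x| *x == 0.0 && x.is_sign_negative()) => {
            // Rewrite -0.0 to +0.0, leaving all other values unchanged.
            let normalized = v.iter().map(|x| if *x == 0.0 { 0.0 } else { *x }).collect();
            Cow::Owned(Column::Float64(normalized))
        }
        _ => Cow::Borrowed(col),
    }
}
```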

github-actions bot added the labels physical-expr (Changes to the physical-expr crates) and sqllogictest (SQL Logic Tests (.slt)) on May 29, 2026


Development

Successfully merging this pull request may close these issues.

DataFusion evaluates -0.0 >= 0.0 as false
